Recognizing activities with cluster-trees of tracklets

Authors

  • Adrien Gaidon
  • Zaïd Harchaoui
  • Cordelia Schmid
Abstract

Although the structure of simple actions can be captured by rigid grids [3] or by sequences of short temporal parts [2], activities are composed of a variable number of sub-events connected by more complex spatio-temporal relations. In this paper, we learn how to automatically represent activities as a hierarchy of mid-level motion components in order to improve activity classification in real-world videos. This hierarchy is a video-specific, data-driven decomposition obtained by clustering tracklets, i.e. local point trajectories of a fixed small duration. Our first contribution is a hierarchical spectral clustering algorithm, based on top-down recursive bi-partitioning. We propose to robustly threshold tracklet projections on an approximate spectral embedding by minimizing a spatio-temporal connectivity cost. This allows for clusters of arbitrary shape and an efficient greedy splitting strategy that automatically determines the number of clusters. The resulting hierarchical decomposition provides structural information relating motion parts together. Our second contribution is the use of this entire tree structure, called cluster-tree (cf. Figure 1), in order to build a hierarchical model of the motion content of a video. We introduce a corresponding tree representation of actions, called BOF-tree. The BOF-tree of a video has the same structure as its cluster-tree and each node is modeled by a bag-of-features (BOF) over the MBH descriptors [6] of its constitutive tracklets. Efficiently using this structural information is challenging as cluster-trees have a variable number of nodes and a structure specific to each video. Furthermore, there is no natural left-to-right ordering of the two children of a parent node.
Therefore, we introduce an efficient positive definite kernel, called the “All Tree Edge Pairs” (ATEP) kernel, that computes the structural and visual similarity of two hierarchical decompositions by relying on models of their parent-child relations, as described in the following. Let T_i = (V_i, E_i), i ∈ {1, 2}, be two BOF-trees, defined by their sets of vertices (nodes) V_i and directed edges (parent-child relations) E_i. Each node v ∈ V_i is represented by a BOF, denoted b[v], over its constitutive tracklets. We model a directed edge e = (v_p, v_c) ∈ E_i by the concatenation, denoted b[e] = (b[v_p], b[v_c]), of the BOF of its parent node v_p with the BOF of the child node v_c. Let h be a kernel between BOFs, r_i ∈ V_i be the root of T_i, and w_r ∈ (0, 1) a cross-validated parameter encoding a prior on the importance of the root-to-root comparisons. Our ATEP kernel is defined as:
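
The kernel definition itself is not included in this excerpt. As a rough sketch only, one form consistent with the quantities defined above would weight a root-to-root comparison by w_r and the average of h over all pairs of edge models by (1 - w_r). That combination, the averaging, and the choice of histogram intersection for h are assumptions made for illustration, not the authors' stated formula; the function names and the list-based tree encoding are hypothetical as well.

```python
import numpy as np


def hist_intersection(x, y):
    """Histogram intersection, used here as a stand-in for the BOF kernel h."""
    return float(np.minimum(x, y).sum())


def edge_models(bofs, edges):
    """Build b[e] = (b[v_p], b[v_c]) for every directed edge (parent, child)."""
    return [np.concatenate([bofs[p], bofs[c]]) for p, c in edges]


def atep_like_kernel(bofs1, edges1, bofs2, edges2,
                     root1=0, root2=0, w_r=0.5, h=hist_intersection):
    """Assumed ATEP-style similarity between two BOF-trees.

    bofs*  : list of 1-D non-negative arrays, one BOF per node
    edges* : list of (parent_index, child_index) pairs
    """
    # Root-to-root comparison, weighted by the prior w_r.
    k_root = h(bofs1[root1], bofs2[root2])

    # A tree with no edges contributes nothing to the edge term.
    if not edges1 or not edges2:
        return w_r * k_root

    # Compare every edge of the first tree with every edge of the second.
    # No correspondence or child ordering is needed, so the number of nodes
    # may differ and the order of the edge lists is irrelevant.
    e1 = edge_models(bofs1, edges1)
    e2 = edge_models(bofs2, edges2)
    k_edges = sum(h(a, b) for a in e1 for b in e2) / (len(e1) * len(e2))

    return w_r * k_root + (1.0 - w_r) * k_edges
```

Because the edge term sums over all pairs, swapping the two children of a node leaves the value unchanged, which matches the remark that the children have no natural left-to-right ordering.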


Similar resources

Recognizing activities with cluster-trees of tracklets

We address the problem of recognizing complex activities, such as pole vaulting, which are characterized by the composition of a large and variable number of different spatio-temporal parts. We represent a video as a hierarchy of mid-level motion components. This hierarchy is a data-driven decomposition specific to each video. We introduce a divisive clustering algorithm that can efficiently ex...


Tracklet clustering for robust multiple object tracking using distance dependent Chinese restaurant processes

To contrive an accurate and efficient strategy for object detection–object track assignment problem, we present a tracklet clustering approach using distance dependent Chinese restaurant processes (ddCRPs), which employ a two-level robust object tracker. The first level is an ordinary tracklet generator that obtains short yet reliable tracklets. In the second level, we cluster the tracklets ove...


Recognising complex activities with histograms of relative tracklets

One approach to the recognition of complex human activities is to use feature descriptors that encode visual interactions by describing properties of local visual features with respect to trajectories of tracked objects. We explore an example of such an approach in which dense tracklets are described relative to multiple reference trajectories, providing a rich representation of complex interac...


Semantic Analysis for Crowded Scenes Based on Non-Parametric Tracklet Clustering

In this paper we address the problem of semantic analysis of structured/unstructured crowded video scenes. Our proposed approach relies on tracklets for motion representation. Each extracted tracklet is abstracted as a directed line segment, and a novel tracklet similarity measure is formulated based on line geometry. For analysis, we apply non-parametric clustering on the extracted tracklets. ...


Temporally Coherent CRP: A Bayesian Non-Parametric Approach for Clustering Tracklets with applications to Person Discovery in Videos

Tracklet clustering is central to several computer vision tasks [17][20]. A video can be represented as a sequence of tracklets, each spanning 10-20 successive video frames, and each tracklet is associated with one entity (e.g. a person in the case of TV-serial videos). Tracklets are instances of data-types exhibiting rich spatio-temporal structure. Existing approaches model tracklets by deployin...



Publication date: 2012